SQL Server 2012: T-SQL Enhancements - Windowing (OVER Clause) Enhancements

7/15/2013 9:18:20 PM

The first windowing capabilities appeared in SQL Server 2005 with the introduction of the OVER clause and a set of four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. In our discussion, the term “window” refers to the scope of visibility from one row in a result set relative to neighboring rows in the same result set. By default, OVER produces a single window over the entire result set, but its associated PARTITION BY clause lets you divide the result set into multiple groups, each contained in its own window. The row sequence within each window is determined by an associated ORDER BY clause, and based on this sequence, the ranking functions assign an accumulating value to the rows in the window.

In addition to the ranking functions, the OVER clause can be used with the traditional aggregate functions SUM, COUNT, MIN, MAX, and AVG. When doing so, you do not specify the GROUP BY clause that’s normally required with the aggregate functions. Instead, each row calculates an aggregation based on the window of rows defined with OVER, optionally grouped using PARTITION BY. This is certainly useful, because it allows you to obtain aggregations without being forced to consolidate (and lose) detail rows with a GROUP BY clause. Before SQL Server 2012, however, the aggregate functions could not also use ORDER BY in the OVER clause (as is required when using OVER with the ranking functions), making it impossible to calculate cumulative aggregations at the row level within each window. For example, you could use AVG with OVER (and, optionally, PARTITION BY), but without an associated ORDER BY, there was no designated sequence to the rows in each window, so SQL Server could not compute a running average from one row to the next within the window. Thus, the best that AVG with OVER could do was compute the average for all the rows in the window (independent of row sequence), and then return that value for every row.
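To make this limitation concrete, here is a minimal sketch of an aggregate that uses OVER with PARTITION BY but no ORDER BY. The inline VALUES rows and the column aliases AcctAvg and AcctSum are illustrative assumptions, not part of the sample data used later in this section.

```sql
-- Partition-wide aggregates: OVER has no ORDER BY, so every row in an
-- account receives the same value and no running calculation is possible.
-- The inline VALUES table is hypothetical sample data (the derived-table
-- VALUES syntax requires SQL Server 2008 or later).
SELECT AcctId, Amount,
  AcctAvg = AVG(Amount) OVER (PARTITION BY AcctId),
  AcctSum = SUM(Amount) OVER (PARTITION BY AcctId)
 FROM (VALUES (1, 500), (1, 250), (1, 75),
              (2, 500), (2, 100)) AS t (AcctId, Amount)
 ORDER BY AcctId
```

With these integer amounts, all three rows for account 1 should show an AcctAvg of 275 and an AcctSum of 825, and both rows for account 2 should show 300 and 600. The detail rows are preserved, but the aggregate does not change from one row to the next.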

SQL Server 2012 finally addresses this shortcoming. In the following code samples, you’ll see how OVER/ORDER BY can now be used with all the traditional aggregate functions to provide running calculations within ordered windows. You’ll also learn how to frame windows using the ROWS and RANGE clauses, which adjust the size and scope of the window to enable sliding calculations. And finally, SQL Server 2012 introduces eight new analytic functions (covered in the next section) that are designed specifically to work with ordered (and optionally partitioned) windows using OVER with ORDER BY (and optionally PARTITION BY).

Note

The code in Example 1 creates a table populated with financial transactions from several different accounts. Tangentially, note the use of the new DATEFROMPARTS function (also covered in the next section), which is used to construct a date value from year, month, and day parameters.

Example 1. Preparing sample transaction data for querying with window functions.

CREATE TABLE TxnData (AcctId int, TxnDate date, Amount decimal)
GO
INSERT INTO TxnData (AcctId, TxnDate, Amount) VALUES
  (1, DATEFROMPARTS(2012, 4, 10), 500),  -- 5 transactions for acct 1
  (1, DATEFROMPARTS(2012, 4, 22), 250),
  (1, DATEFROMPARTS(2012, 4, 24), 75),
  (1, DATEFROMPARTS(2012, 4, 26), 125),
  (1, DATEFROMPARTS(2012, 4, 28), 175),
  (2, DATEFROMPARTS(2012, 4, 11), 500),  -- 8 transactions for acct 2
  (2, DATEFROMPARTS(2012, 4, 15), 50),
  (2, DATEFROMPARTS(2012, 4, 22), 5000),
  (2, DATEFROMPARTS(2012, 4, 25), 550),
  (2, DATEFROMPARTS(2012, 4, 27), 105),
  (2, DATEFROMPARTS(2012, 4, 27), 95),
  (2, DATEFROMPARTS(2012, 4, 29), 100),
  (2, DATEFROMPARTS(2012, 4, 30), 2500),
  (3, DATEFROMPARTS(2012, 4, 14), 500),  -- 4 transactions for acct 3
  (3, DATEFROMPARTS(2012, 4, 15), 600),
  (3, DATEFROMPARTS(2012, 4, 22), 25),
  (3, DATEFROMPARTS(2012, 4, 23), 125)

Running Aggregations

In SQL Server 2012, an ORDER BY clause may be specified with OVER to produce running aggregations within each window, as Example 2 demonstrates:

Example 2. Using OVER with ORDER BY to produce running aggregations.

SELECT AcctId, TxnDate, Amount,
  RAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RCnt = COUNT(*)    OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate)
 FROM TxnData
 ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount RAvg        RCnt RMin RMax RSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2012-04-10 500    500.000000  1    500  500  500
1      2012-04-22 250    375.000000  2    250  500  750
1      2012-04-24 75     275.000000  3    75   500  825
1      2012-04-26 125    237.500000  4    75   500  950
1      2012-04-28 175    225.000000  5    75   500  1125
2      2012-04-11 500    500.000000  1    500  500  500
2      2012-04-15 50     275.000000  2    50   500  550
2      2012-04-22 5000   1850.000000 3    50   5000 5550
  :

The results of this query are partitioned (windowed) by account. Within each window, the account’s running averages, counts, minimum/maximum values, and sums are ordered by transaction date, showing the chronologically accumulated values for each account. No ROWS or RANGE clause is specified (framing is explained next), so SQL Server assumes RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW by default. This yields a window frame that spans from the beginning of the partition (the first row of each account) through the current row, plus any rows that tie with the current row on TxnDate (such as the two 4/27/2012 transactions for account 2). When the account ID changes, the previous window is “closed” and new calculations start running for a new window over the next account ID.

Sliding Aggregations

You can also narrow each account’s window by framing it with a ROWS clause in the OVER clause. This enables sliding calculations, as demonstrated in Example 3:

Example 3. Using OVER with ORDER BY and PRECEDING to produce sliding aggregations.

SELECT AcctId, TxnDate, Amount,
  SAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
  SCnt = COUNT(*)    OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING)
 FROM TxnData
 ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount SAvg        SCnt SMin SMax SSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2012-04-10 500    500.000000  1    500  500  500
1      2012-04-22 250    375.000000  2    250  500  750
1      2012-04-24 75     275.000000  3    75   500  825
1      2012-04-26 125    150.000000  3    75   250  450
1      2012-04-28 175    125.000000  3    75   175  375
2      2012-04-11 500    500.000000  1    500  500  500
2      2012-04-15 50     275.000000  2    50   500  550
2      2012-04-22 5000   1850.000000 3    50   5000 5550
  :

This slightly modified version of the previous query specifies ROWS BETWEEN 2 PRECEDING AND CURRENT ROW in the OVER clause for the SAvg column, overriding the default window frame. Specifically, it frames the window within each account’s partition to a maximum of three rows: the current row, the row before it, and one more row before that one. Once the window expands to three rows, it stops growing and starts sliding down the subsequent rows until a new partition (the next account) is encountered. When only a lower bound is given, the BETWEEN…AND CURRENT ROW keywords that specify the upper bound of the window are assumed by default, so to reduce code clutter, the other column definitions specify just the lower bound with the shorter variation ROWS 2 PRECEDING.

Notice how the window “slides” within each account. For example, the sliding maximum for account 1 drops from 500 to 250 in the fourth row, because 250 is the largest value in the window of three rows that begins two rows earlier—and the 500 from the very first row is no longer visible in that window. Similarly, the sliding sum for each account is based on the defined window. Thus, the sliding sum of 375 on the last row of account 1 is the total sum of that row (175) plus the two preceding rows (75 + 125) only—not the total sum for all transactions in the entire account, as the running sum had calculated.
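The frame does not have to end at the current row. As a hedged sketch that goes one step beyond the examples in this section, the following query uses the FOLLOWING keyword to center a three-row window on the current row. It assumes the TxnData table from Example 1 already exists; the CAvg alias is illustrative.

```sql
-- Assumes the TxnData table from Example 1 has been created and populated.
-- Centered window: the previous row, the current row, and the next row
-- within each account. At the edges of a partition the frame simply
-- contains fewer rows.
SELECT AcctId, TxnDate, Amount,
  CAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate
                           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
 FROM TxnData
 ORDER BY AcctId, TxnDate
```

For account 1, the first row should average only itself and the next row ((500 + 250) / 2 = 375), and the last row only itself and the previous row ((125 + 175) / 2 = 150), because no rows exist beyond the partition boundaries.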

Using RANGE versus ROWS

Finally, RANGE can be used instead of ROWS to handle “ties” within a window. Although ROWS treats each row in the window distinctly, RANGE treats rows with duplicate ORDER BY values as a single unit, so all tied rows enter the window frame together, as demonstrated by Example 4:

Example 4. Comparing ROWS and RANGE for calculating window functions.

SELECT AcctId, TxnDate, Amount,
  SumByRows    = SUM(Amount) OVER (ORDER BY TxnDate ROWS UNBOUNDED PRECEDING),
  SumByRange   = SUM(Amount) OVER (ORDER BY TxnDate RANGE UNBOUNDED PRECEDING)
 FROM TxnData
 WHERE AcctId = 2
 ORDER BY TxnDate

AcctId  TxnDate     Amount  SumByRows  SumByRange
------  ----------  ------  ---------  ----------
2       2012-04-11  500     500        500
2       2012-04-15  50      550        550
2       2012-04-22  5000    5550       5550
2       2012-04-25  550     6100       6100
2       2012-04-27  105     6205       6300
2       2012-04-27  95      6300       6300
2       2012-04-29  100     6400       6400
2       2012-04-30  2500    8900       8900

In this result set, ROWS and RANGE return the same values except on the fifth row. Because the fifth and sixth rows are tied on the same date (4/27/2012), RANGE returns the combined running sum of both rows for each of them, while ROWS adds them one at a time. On the sixth row, the last of the tied pair, ROWS “catches up” with RANGE, and from the seventh row (4/29/2012) onward both return the same running totals for the rest of the window.
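The distinction also matters when no frame is written at all. According to the SQL Server documentation for the OVER clause, an ORDER BY without ROWS or RANGE defaults to RANGE UNBOUNDED PRECEDING AND CURRENT ROW. The following sketch, which assumes the TxnData table from Example 1 and uses illustrative column aliases, places the default next to both explicit forms so the behavior can be checked directly.

```sql
-- Assumes the TxnData table from Example 1 has been created and populated.
-- SumDefault has no frame clause; it is expected to match SumByRange,
-- not SumByRows, on the two tied 4/27/2012 rows.
SELECT AcctId, TxnDate, Amount,
  SumDefault = SUM(Amount) OVER (ORDER BY TxnDate),
  SumByRange = SUM(Amount) OVER (ORDER BY TxnDate
                                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
  SumByRows  = SUM(Amount) OVER (ORDER BY TxnDate
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
 FROM TxnData
 WHERE AcctId = 2
 ORDER BY TxnDate
```

If a strictly row-by-row running total is wanted, it is safer to write the ROWS frame explicitly than to rely on the default. Note also that TxnDate alone does not determine which of two tied rows comes first, so the intermediate ROWS value on tied rows can vary between executions unless the ORDER BY includes a tie-breaking column.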